Null models

Raup-Crick

null.rcq(obj, constrainingVar='None', randomization='frequency', weightingVar='None', weight=1, iterations=99, divType='naive', distmat='None', q=1, compareVar='None')

The observed dissimilarities between samples are compared to a null distribution. The null model randomizes the count table to calculate a null expectation of the pairwise dissimilarities between samples. This is repeated a number of times (iterations) to build a null distribution. During the randomization, the total number of OTUs/ASVs and the read count of each sample are kept constant, but the identity of the OTUs/ASVs and the distribution of reads between them are randomized. The function returns a Python dictionary with several items:

  • ‘divType’ is information about the diversity type used in the calculations;
  • ‘obs_d’ is the actually observed dissimilarity values;
  • ‘null_mean’ is the mean values of the null dissimilarities (i.e. the dissimilarities of the randomized tables);
  • ‘null_std’ is the standard deviation of the null dissimilarities;
  • ‘p_index’ is the Raup-Crick measure constrained between 0 and 1 (i.e. the fraction of times the observed dissimilarities are higher than the null expectation);
  • ‘ses’ is the standardized effect size (ses=(null_mean - obs_d)/null_std). A positive value means that the compared communities are more similar than expected and a negative value means they are more dissimilar than expected.
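As a rough illustration of how ‘null_mean’, ‘null_std’, ‘p_index’, and ‘ses’ relate to each other, the sketch below summarizes one observed dissimilarity against a list of null values. It is not the qdiv implementation: the function name summarize_null is hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np

def summarize_null(obs, null_values):
    """Summarize one observed dissimilarity against its null distribution."""
    null_values = np.asarray(null_values, dtype=float)
    null_mean = null_values.mean()
    null_std = null_values.std()  # assumption: population standard deviation
    # Fraction of randomizations in which the observed value is higher than the null value
    p_index = float(np.mean(obs > null_values))
    # Positive ses: the samples are more similar than the null model expects
    ses = (null_mean - obs) / null_std
    return {"null_mean": null_mean, "null_std": null_std, "p_index": p_index, "ses": ses}

# Hypothetical values: the observed dissimilarity is lower than every null value
result = summarize_null(0.2, [0.4, 0.5, 0.6, 0.5])
```

In this toy case the observed dissimilarity lies below all null values, so ‘p_index’ is 0 and ‘ses’ is positive.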

obj is the qdiv object.

constrainingVar is a column heading in the meta data that can be used to constrain the randomizations so that read counts are only randomized within a certain category of samples.

randomization specifies the randomization procedure:

  • ‘abundance’ means that ASVs are drawn to each sample based on the total read counts in the frequency table (or the part of the table defined by constrainingVar);
  • ‘frequency’ means that ASVs are drawn based on the number of samples in which they are detected. This method is the same as in Stegen et al. (2013). ISME Journal, 7(11), 2069-2079;
  • ‘weighting’ uses the abundance method, but a meta data column (weightingVar) can be used to categorize samples, and the weight parameter decides the importance of the category of samples with the lowest richness. A weight of 0 means that the low-richness samples are not considered in the regional community used to populate the samples with read counts, while a weight of 1 means that all sample groups have equal weighting.
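The ‘frequency’ procedure can be sketched for a single sample as shown below. This is a simplified illustration and not the qdiv code: the function name randomize_sample is hypothetical, and the way the reads are spread over the drawn ASVs (in proportion to their total read counts) is an assumption, since only the drawing of ASV identities is specified above.

```python
import numpy as np

def randomize_sample(counts, sample, rng):
    """Return a randomized count vector for one column of `counts` (ASVs x samples)."""
    richness = int(np.count_nonzero(counts[:, sample]))
    reads = int(counts[:, sample].sum())
    # Occurrence frequency: number of samples in which each ASV is detected
    freq = np.count_nonzero(counts, axis=1).astype(float)
    chosen = rng.choice(counts.shape[0], size=richness, replace=False, p=freq / freq.sum())
    new = np.zeros(counts.shape[0], dtype=np.int64)
    new[chosen] = 1  # one read per drawn ASV, so the richness is preserved
    # Assumption: the remaining reads follow the total abundance of the drawn ASVs
    weights = counts[chosen].sum(axis=1).astype(float)
    new[chosen] += rng.multinomial(reads - richness, weights / weights.sum())
    return new

# Hypothetical count table with 4 ASVs (rows) and 3 samples (columns)
counts = np.array([[10, 0, 3], [5, 2, 0], [0, 7, 1], [1, 1, 1]])
rng = np.random.default_rng(0)
randomized = randomize_sample(counts, 0, rng)
```

The randomized sample keeps the richness and the read count of the original sample, while the ASV identities and the split of reads between them change.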

iterations specifies the number of randomizations. 999 is commonly used, but this could take several hours for large count tables.

divType specifies the dissimilarity index to calculate: ‘naive’, ‘phyl’, ‘func’, ‘Jaccard’, and ‘Bray’, are available choices. For ‘func’, distmat must be specified.

q is the diversity order if divType is ‘naive’, ‘phyl’, or ‘func’.

compareVar is a column heading in the meta data. If compareVar is not None, the mean and standard deviation of obs_d, p_index, and ses are returned in the output as obs_d_mean, obs_d_std, etc. They represent the mean and standard deviation of all pairwise comparisons between the meta data categories specified under compareVar.
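The aggregation implied by compareVar can be sketched with pandas as follows. The function name summarize_by_category, the dictionary that maps samples to categories, and the handling of within-category pairs are assumptions made for illustration, not the qdiv implementation.

```python
import itertools
import numpy as np
import pandas as pd

def summarize_by_category(pairwise, categories):
    """Mean and standard deviation of pairwise values for every pair of categories."""
    cats = sorted(set(categories.values()))
    mean = pd.DataFrame(np.nan, index=cats, columns=cats)
    std = pd.DataFrame(np.nan, index=cats, columns=cats)
    for c1, c2 in itertools.combinations_with_replacement(cats, 2):
        s1 = [s for s in pairwise.index if categories[s] == c1]
        s2 = [s for s in pairwise.index if categories[s] == c2]
        # Use each unordered sample pair once and skip self-comparisons
        values = [pairwise.loc[a, b] for a in s1 for b in s2 if a != b and (c1 != c2 or a < b)]
        if values:
            mean.loc[c1, c2] = mean.loc[c2, c1] = np.mean(values)
            std.loc[c1, c2] = std.loc[c2, c1] = np.std(values)
    return mean, std

# Hypothetical pairwise values (e.g. obs_d) for three samples in two categories
samples = ["A1", "A2", "B1"]
pairwise = pd.DataFrame([[0.0, 0.2, 0.6], [0.2, 0.0, 0.8], [0.6, 0.8, 0.0]],
                        index=samples, columns=samples)
categories = {"A1": "A", "A2": "A", "B1": "B"}
mean, std = summarize_by_category(pairwise, categories)
```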

The index is inspired by the articles: Raup and Crick (1979), J Paleontology, 53(5), 1213-1227 and Chase et al. (2011), Ecosphere, 2(2), 24.

Phylogenetic null models

null.nriq(obj, distmat, q=1, iterations=99)

Calculates the Net Relatedness Index (NRI). This is done by calculating the Mean Phylogenetic Distance (MPD) between OTUs/ASVs in a sample, then randomizing the positions of the OTUs/ASVs at the end nodes in the phylogenetic tree and generating a null distribution of the MPD based on a large number of randomizations. See Webb et al. (2002) Annu. Rev. Ecol. Sys. 33, 475-505 (DOI: 10.1146/annurev.ecolsys.33.010802.150448). The NRI is defined as the standardized effect size:

  • NRI=(MPDnull_mean-MPDobs)/MPDnull_stdev.

Here, we have generalized NRI and MPD to all diversity orders (q). The function returns a dataframe with the following column headings: ‘MPDq’, ‘null_mean’, ‘null_std’, ‘p_index’, ‘ses’.

  • ‘p_index’ is a probability index constrained between 0 and 1. If it is close to 0, the sample is more phylogenetically clustered than expected by the null model. If it is close to 1, the sample is more phylogenetically dispersed than expected by the null model.
  • ‘ses’ is the NRI index. If it is positive, the sample is more phylogenetically clustered than expected by the null model. If it is negative, the sample is more phylogenetically dispersed than expected by the null model.

obj is a qdiv object.

distmat is a pandas dataframe with pairwise distances between OTUs/ASVs. It can be generated by stats.sequence_comparison(obj, inputType='tree').

q is the diversity order.

iterations is the number of randomizations in the null model.
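The NRI procedure described above can be sketched with NumPy for the classical presence/absence case. This is a simplified illustration: it does not reproduce the generalization to diversity orders (q), and the function names mpd and nri are hypothetical, not part of qdiv.

```python
import numpy as np

def mpd(dist, present):
    """Mean pairwise distance among the taxa flagged True in `present`."""
    idx = np.flatnonzero(present)
    sub = dist[np.ix_(idx, idx)]
    # Upper triangle without the diagonal: each pair of taxa is counted once
    return sub[np.triu_indices(len(idx), k=1)].mean()

def nri(dist, present, iterations=99, seed=0):
    """Standardized effect size of the MPD under tip-label shuffling."""
    rng = np.random.default_rng(seed)
    obs = mpd(dist, present)
    null = np.empty(iterations)
    for i in range(iterations):
        # Permuting rows and columns together relabels the tips of the tree
        perm = rng.permutation(dist.shape[0])
        null[i] = mpd(dist[np.ix_(perm, perm)], present)
    return (null.mean() - obs) / null.std()

# Toy distances: taxa 0-2 form one clade, taxa 3-5 another
dist = np.ones((6, 6))
dist[:3, :3] = 0.1
dist[3:, 3:] = 0.1
np.fill_diagonal(dist, 0.0)

clustered = np.array([True, True, True, False, False, False])
nri_clustered = nri(dist, clustered)
```

A sample that contains only taxa from one clade has a lower MPD than most shuffled versions of itself, so the resulting NRI is positive.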

null.ntiq(obj, distmat, q=1, iterations=99)

Calculates the Nearest Taxon Index (NTI). This is done by calculating the Mean Nearest Taxon Distance (MNTD) between OTUs/ASVs in a sample, then randomizing the positions of the OTUs/ASVs at the end nodes in the phylogenetic tree and generating a null distribution of the MNTD based on a large number of randomizations. See Webb et al. (2002) Annu. Rev. Ecol. Sys. 33, 475-505 (DOI: 10.1146/annurev.ecolsys.33.010802.150448). The NTI is defined as the standardized effect size:

  • NTI=(MNTDnull_mean-MNTDobs)/MNTDnull_stdev.

Here, we have generalized NTI and MNTD to all diversity orders (q). The function returns a dataframe with the following column headings: ‘MNTDq’, ‘null_mean’, ‘null_std’, ‘p_index’, ‘ses’.

  • ‘p_index’ is a probability index constrained between 0 and 1. If it is close to 0, the sample is more phylogenetically clustered than expected by the null model. If it is close to 1, the sample is more phylogenetically dispersed than expected by the null model.
  • ‘ses’ is the NTI index. If it is positive, the sample is more phylogenetically clustered than expected by the null model. If it is negative, the sample is more phylogenetically dispersed than expected by the null model.

obj is a qdiv object.

distmat is a pandas dataframe with pairwise distances between OTUs/ASVs. It can be generated by stats.sequence_comparison(obj, inputType='tree').

q is the diversity order.

iterations is the number of randomizations in the null model.
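A comparable sketch for the NTI replaces the mean pairwise distance with the mean distance to the nearest relative. It again covers only the presence/absence case, not the generalization to diversity orders (q), and the function names mntd and nti are hypothetical.

```python
import numpy as np

def mntd(dist, present):
    """Mean distance from each present taxon to its nearest present relative."""
    idx = np.flatnonzero(present)
    sub = dist[np.ix_(idx, idx)].astype(float)
    np.fill_diagonal(sub, np.inf)  # a taxon is not its own nearest relative
    return sub.min(axis=1).mean()

def nti(dist, present, iterations=99, seed=0):
    """Standardized effect size of the MNTD under tip-label shuffling."""
    rng = np.random.default_rng(seed)
    obs = mntd(dist, present)
    null = np.empty(iterations)
    for i in range(iterations):
        # Relabel the tips by permuting rows and columns together
        perm = rng.permutation(dist.shape[0])
        null[i] = mntd(dist[np.ix_(perm, perm)], present)
    return (null.mean() - obs) / null.std()

# Toy distances: taxa 0-2 form one clade, taxa 3-5 another
dist = np.ones((6, 6))
dist[:3, :3] = 0.1
dist[3:, 3:] = 0.1
np.fill_diagonal(dist, 0.0)

clustered = np.array([True, True, True, False, False, False])
spread = np.array([True, False, False, True, False, False])
nti_clustered = nti(dist, clustered)
nti_spread = nti(dist, spread)
```

The sample drawn from a single clade gives a positive NTI, while the sample with one taxon from each clade gives a negative NTI.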

null.beta_nriq(obj, distmat, q=1, iterations=99)

Calculates the Net Relatedness Index between samples (beta NRI). This is done by calculating the Mean Phylogenetic Distance (beta MPD) between OTUs/ASVs in pairs of samples, then randomizing the positions of the OTUs/ASVs at the end nodes in the phylogenetic tree and generating a null distribution of the beta MPD based on a large number of randomizations. The beta NRI is defined as the standardized effect size in the same way as the NRI index is defined above. See Fine and Kembel (2011) Ecography 34, 552-565 (DOI: 10.1111/j.1600-0587.2010.06548.x). Here, we have generalized beta NRI and beta MPD to all diversity orders (q). The function returns a dictionary with the following dataframes: ‘beta_MPDq’, ‘null_mean’, ‘null_std’, ‘p_index’, ‘ses’. The dataframes have pairwise values between samples.

  • ‘p_index’ is a probability index constrained between 0 and 1. If it is close to 0, the pair of samples is more phylogenetically clustered than expected by the null model. If it is close to 1, the pair of samples is more phylogenetically dispersed than expected by the null model.
  • ‘ses’ is the beta NRI index. If it is positive, the pair of samples is more phylogenetically clustered than expected by the null model. If it is negative, the pair of samples is more phylogenetically dispersed than expected by the null model.

obj is a qdiv object.

distmat is a pandas dataframe with pairwise distances between OTUs/ASVs. It can be generated by stats.sequence_comparison(obj, inputType='tree').

q is the diversity order.

iterations is the number of randomizations in the null model.
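For a pair of samples, the beta MPD and its standardized effect size can be sketched as below. The sketch uses presence/absence data only and does not reproduce the generalization to diversity orders (q). The function names beta_mpd and beta_nri are hypothetical, and letting shared taxa contribute a distance of zero is an assumption.

```python
import numpy as np

def beta_mpd(dist, present_a, present_b):
    """Mean distance between every taxon in sample A and every taxon in sample B."""
    a = np.flatnonzero(present_a)
    b = np.flatnonzero(present_b)
    # Assumption: taxa shared by both samples contribute a distance of zero
    return dist[np.ix_(a, b)].mean()

def beta_nri(dist, present_a, present_b, iterations=99, seed=0):
    """Standardized effect size of the beta MPD under tip-label shuffling."""
    rng = np.random.default_rng(seed)
    obs = beta_mpd(dist, present_a, present_b)
    null = np.empty(iterations)
    for i in range(iterations):
        # Relabel the tips by permuting rows and columns together
        perm = rng.permutation(dist.shape[0])
        null[i] = beta_mpd(dist[np.ix_(perm, perm)], present_a, present_b)
    return (null.mean() - obs) / null.std()

# Toy distances: taxa 0-2 form one clade, taxa 3-5 another
dist = np.ones((6, 6))
dist[:3, :3] = 0.1
dist[3:, 3:] = 0.1
np.fill_diagonal(dist, 0.0)

sample_a = np.array([True, True, False, False, False, False])
sample_b = np.array([False, True, True, False, False, False])
beta_nri_value = beta_nri(dist, sample_a, sample_b)
```

Both toy samples draw their taxa from the same clade, so the observed beta MPD is lower than the null mean and the beta NRI is positive.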

null.beta_ntiq(obj, distmat, q=1, iterations=99)

Calculates the Nearest Taxon Index between samples (beta NTI). This is done by calculating the Mean Nearest Taxon Distance (beta MNTD) between OTUs/ASVs in pairs of samples, then randomizing the positions of the OTUs/ASVs at the end nodes in the phylogenetic tree and generating a null distribution of the beta MNTD based on a large number of randomizations. The beta NTI is defined as the standardized effect size in the same way as the NTI index is defined above. See Fine and Kembel (2011) Ecography 34, 552-565 (DOI: 10.1111/j.1600-0587.2010.06548.x). Here, we have generalized beta NTI and beta MNTD to all diversity orders (q). The function returns a dictionary with the following dataframes: ‘beta_MNTDq’, ‘null_mean’, ‘null_std’, ‘p_index’, ‘ses’. The dataframes have pairwise values between samples.

  • ‘p_index’ is a probability index constrained between 0 and 1. If it is close to 0, the pair of samples is more phylogenetically clustered than expected by the null model. If it is close to 1, the pair of samples is more phylogenetically dispersed than expected by the null model.
  • ‘ses’ is the beta NTI index. If it is positive, the pair of samples is more phylogenetically clustered than expected by the null model. If it is negative, the pair of samples is more phylogenetically dispersed than expected by the null model.

obj is a qdiv object.

distmat is a pandas dataframe with pairwise distances between OTUs/ASVs. It can be generated by stats.sequence_comparison(obj, inputType='tree').

q is the diversity order.

iterations is the number of randomizations in the null model.
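The beta MNTD can be sketched in the same way, taking for each taxon in one sample the distance to its nearest relative in the other sample and averaging over both directions. As before, this covers presence/absence data only and not the generalization to diversity orders (q). The function names beta_mntd and beta_nti are hypothetical, and the equal weighting of the two directions is an assumption.

```python
import numpy as np

def beta_mntd(dist, present_a, present_b):
    """Mean nearest-relative distance between two samples, averaged in both directions."""
    a = np.flatnonzero(present_a)
    b = np.flatnonzero(present_b)
    between = dist[np.ix_(a, b)]
    a_to_b = between.min(axis=1).mean()  # nearest relative in B for each taxon in A
    b_to_a = between.min(axis=0).mean()  # nearest relative in A for each taxon in B
    return 0.5 * (a_to_b + b_to_a)

def beta_nti(dist, present_a, present_b, iterations=99, seed=0):
    """Standardized effect size of the beta MNTD under tip-label shuffling."""
    rng = np.random.default_rng(seed)
    obs = beta_mntd(dist, present_a, present_b)
    null = np.empty(iterations)
    for i in range(iterations):
        # Relabel the tips by permuting rows and columns together
        perm = rng.permutation(dist.shape[0])
        null[i] = beta_mntd(dist[np.ix_(perm, perm)], present_a, present_b)
    return (null.mean() - obs) / null.std()

# Toy distances: taxa 0-2 form one clade, taxa 3-5 another
dist = np.ones((6, 6))
dist[:3, :3] = 0.1
dist[3:, 3:] = 0.1
np.fill_diagonal(dist, 0.0)

sample_a = np.array([True, True, False, False, False, False])
sample_b = np.array([False, True, True, False, False, False])
beta_nti_value = beta_nti(dist, sample_a, sample_b)
```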